Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
                                            Some full text articles may not yet be available without a charge during the embargo (administrative interval).
                                        
                                        
                                        
                                            
                                                
                                             What is a DOI Number?
                                        
                                    
                                
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            Free, publicly-accessible full text available June 20, 2026
- 
            Free, publicly-accessible full text available March 1, 2026
- 
            Creating a biomedical knowledge base by addressing GPT inaccurate responses and benchmarking contextWe created GNQA, a generative pre-trained transformer (GPT) knowledge base driven by a performant retrieval augmented generation (RAG) with a focus on aging, dementia, Alzheimer’s and diabetes. We uploaded a corpus of three thousand peer reviewed publications on these topics into the RAG. To address concerns about inaccurate responses and GPT ‘hallucinations’, we implemented a context provenance tracking mechanism that enables researchers to validate responses against the original material and to get references to the original papers. To assess the effectiveness of contextual information we collected evaluations and feedback from both domain expert users and ‘citizen scientists’ on the relevance of GPT responses. A key innovation of our study is automated evaluation by way of a RAG assessment system (RAGAS). RAGAS combines human expert assessment with AI-driven evaluation to measure the effectiveness of RAG systems. When evaluating the responses to their questions, human respondents give a “thumbs-up” 76% of the time. Meanwhile, RAGAS scores 90% on answer relevance on questions posed by experts. And when GPT-generates questions, RAGAS scores 74% on answer relevance. With RAGAS we created a benchmark that can be used to continuously assess the performance of our knowledge base. Full GNQA functionality is embedded in the freeGeneNetwork.orgweb service, an open-source system containing over 25 years of experimental data on model organisms and human. The code developed for this study is published under a free and open-source software license athttps://git.genenetwork.org/gn-ai/tree/README.md.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                     Full Text Available
                                                Full Text Available